This paper studies how to flexibly integrate reconstructed 3D models into practical 3D modeling pipelines such as 3D scene creation and rendering. Due to technical limitations, existing 3D reconstruction techniques can only produce rough 3D models (R3DMs) for most real objects. As a result, physically-based rendering (PBR) renders low-quality images or videos for scenes constructed from R3DMs. One promising solution is to represent real-world objects as Neural Fields such as NeRFs, which can generate photo-realistic renderings of an object under desired viewpoints. However, a drawback is that the views synthesized through Neural Fields Rendering (NFR) cannot reflect the simulated lighting details on R3DMs in PBR pipelines, especially when object interactions in 3D scene creation cause local shadows. To resolve this dilemma, we propose a lighting transfer network (LighTNet) to bridge NFR and PBR, such that they can benefit from each other. LighTNet reasons about a simplified image composition model, remedies the uneven surface issue caused by R3DMs, and is empowered by several perceptually motivated constraints and a new Lab angle loss that enhances the contrast between lighting strength and colors. Comparisons demonstrate that LighTNet is superior in synthesizing impressive lighting, and is promising for pushing NFR further into practical 3D modeling workflows. Project page: https://3d-front-future.github.io/LighTNet .
With the rapid development of deep generative models (such as Generative Adversarial Networks and Auto-encoders), AI-synthesized images of the human face are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods have shown high performance in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios where testing images can be generated by more powerful generation models or combined with various post-processing operations. To address this issue, we propose a Global and Local Feature Fusion (GLFF) framework to learn rich and discriminative representations for face forgery detection by combining multi-scale global features from the whole image with refined local features from informative patches. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features and a local branch that selects informative patches for detailed local artifact extraction. Due to the lack of a face forgery dataset simulating real-world applications for evaluation, we further create a challenging face forgery dataset, named DeepFakeFaceForensics (DF^3), which covers 6 state-of-the-art generation models and a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over state-of-the-art methods on the proposed DF^3 dataset and three other open-source datasets.
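The two-branch fusion described above can be sketched in a few lines. This is a hypothetical toy version, not the paper's network: patch variance stands in for the learned attention-based selector of "informative" patches, and simple mean pooling stands in for the learned feature extractors.

```python
# Toy sketch of GLFF-style global/local feature fusion (hypothetical
# stand-ins: variance as the patch-informativeness score, mean pooling
# as the feature extractor; the real model learns both end-to-end).

def patch_variance(patch):
    """Variance of pixel intensities, used as an informativeness proxy."""
    n = len(patch)
    mean = sum(patch) / n
    return sum((p - mean) ** 2 for p in patch) / n

def global_features(patches):
    """Global branch: a coarse whole-image mean plus per-patch means,
    concatenated into one multi-scale vector."""
    per_patch = [sum(p) / len(p) for p in patches]
    coarse = sum(per_patch) / len(per_patch)
    return [coarse] + per_patch

def local_features(patches, k=2):
    """Local branch: keep the k most 'informative' (highest-variance)
    patches and pool each one."""
    ranked = sorted(patches, key=patch_variance, reverse=True)[:k]
    return [sum(p) / len(p) for p in ranked]

def glff_fuse(patches, k=2):
    """Fuse the two branches by concatenation."""
    return global_features(patches) + local_features(patches, k)

# Three flattened patches; the middle one carries a strong artifact-like
# high-frequency pattern, so the local branch picks it first.
patches = [[0, 0, 0, 0], [0, 255, 0, 255], [10, 10, 10, 10]]
fused = glff_fuse(patches, k=2)
```

Concatenation is the simplest fusion choice; the paper's attention-based fusion would weight the two branches instead.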
We have a Christmas gift for Harry Potter fans all over the world. In this paper, we present Harry Potter Dialogue (HPD), a dataset that helps train Harry Potter-like dialogue agents. Such a task is typically viewed as a variant of personalized dialogue agents, but they differ significantly in three respects: 1) Harry lives in a fictional world of wizards, so real-world commonsense may not apply to his conversations; 2) Harry's behavior is strongly linked to background information in conversations: the scene, its attributes, and his relationships to other speakers; and 3) such backgrounds change dynamically as the storyline goes on. The HPD dataset, as the first dataset to facilitate the study of dialogue agent construction for characters within a story, provides rich contextual information about each dialogue session, such as scenes, character attributes, and relations. More importantly, all the background information changes over the course of the story. In addition, HPD supports both dialogue generation and retrieval tasks. We evaluate baselines such as DialoGPT and BOB to determine the extent to which they can generate Harry Potter-like responses. The experimental results are disappointing: although the generated responses are fluent, they still seem out of character for Harry. We also evaluate ChatGPT, currently the most capable dialogue agent, which likewise fails to generate plausible Harry Potter-like responses in some cases. Our results suggest that there is much scope for future research.
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and task the participants with designing an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to 500 FPS and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
Benefiting from the rapid development of deep learning, many CNN-based image super-resolution methods have emerged and achieved strong results. However, most algorithms struggle to adapt to spatial regions and channel features simultaneously, let alone exchange information between them. In addition, the information exchange between attention modules has received even less attention from researchers. To address these problems, we propose a lightweight spatial-channel adaptive coordination with multi-level refinement enhancement network (MREN). Specifically, we construct a spatial-channel adaptive coordination block that enables the network to learn information of interest from both spatial regions and channels under different receptive fields. Moreover, information at the corresponding feature-processing levels of the spatial part and the channel part is exchanged via skip connections to achieve coordination between the two. We establish a communication bridge between attention modules through a simple linear combination operation, so as to guide the network to attend to information of interest more accurately and continuously. Extensive experiments on several standard test sets demonstrate that our MREN achieves superior performance over other advanced algorithms with a small number of parameters and very low computational complexity.
In this paper, we propose a genuine group-level contrastive visual representation learning method whose linear evaluation performance on ImageNet surpasses that of vanilla supervised learning. The two mainstream unsupervised learning schemes are instance-level contrastive frameworks and clustering-based schemes. The former adopts extremely fine-grained instance-level discrimination, whose supervision signal is weakened by false negatives; although the latter addresses this, it typically suffers from several constraints that hurt performance. To integrate their advantages, we design the SMoG method. SMoG follows the framework of contrastive learning but replaces the contrastive unit from instance to group, mimicking clustering-based methods. To achieve this, we propose a momentum grouping scheme that performs feature grouping and representation learning synchronously. In this way, SMoG solves the lagging supervision signal problem that clustering-based methods usually face, and reduces the false negatives of instance-level contrastive methods. We conduct exhaustive experiments showing that SMoG works well on both CNN and Transformer backbones. Results prove that SMoG surpasses current state-of-the-art unsupervised representation methods; moreover, its linear evaluation results surpass the performance obtained by vanilla supervised learning, and it transfers well to downstream tasks.
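The momentum grouping idea described above can be illustrated with a toy 1-D sketch (this is not the paper's implementation): each instance feature is assigned to its nearest group center, and that center is immediately updated by a momentum moving average, so grouping evolves in step with representation learning instead of lagging a full clustering epoch behind.

```python
# Toy 1-D sketch of synchronous momentum grouping (hypothetical
# simplification: real features are high-dimensional and L2-normalized).

def assign_group(feature, centers):
    """Index of the nearest group center (the group-level contrastive unit)."""
    return min(range(len(centers)), key=lambda k: abs(centers[k] - feature))

def momentum_update(centers, feature, group, m=0.9):
    """Move only the assigned center slightly toward the new feature."""
    centers = list(centers)
    centers[group] = m * centers[group] + (1 - m) * feature
    return centers

# Two group centers; features arrive one by one, as in a training stream.
centers = [0.0, 10.0]
for f in [9.0, 11.0, 0.5]:
    g = assign_group(f, centers)
    centers = momentum_update(centers, f, g)
```

Because centers move after every sample, an instance is always contrasted against up-to-date groups, which is the point of the synchronous scheme.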
Hierarchical classification aims to categorize objects according to a hierarchy of classes. For example, birds can be categorized according to a three-level hierarchy of order, family, and species. Existing methods commonly address hierarchical classification by decoupling it into several multi-class classification tasks. However, such a multi-task learning strategy fails to fully exploit the correlations among the various categories across different hierarchy levels. In this paper, we propose Label Hierarchy Transition, a unified probabilistic framework based on deep learning, to address hierarchical classification. Specifically, we explicitly learn label hierarchy transition matrices, whose column vectors represent the conditional label distributions of classes between two adjacent hierarchy levels and are capable of encoding the correlations embedded in the class hierarchy. We further propose a confusion loss, which encourages the classification network to learn the correlations across different label hierarchies during training. The proposed framework can be adapted to any existing deep network with only slight modifications. We experiment on three public benchmark datasets with various class hierarchies, and the results demonstrate the superiority of our approach over existing methods. The source code will be made publicly available.
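The core object above, a transition matrix whose columns are conditional label distributions between adjacent hierarchy levels, is easy to demonstrate. In this hedged sketch the matrix is hand-written rather than learned, and the class names are made up for illustration.

```python
# Minimal sketch of applying a label-hierarchy transition matrix.
# Column j of T holds p(fine class | coarse class j); in the paper T is
# learned end-to-end, here it is a hypothetical hand-written example.

def apply_transition(coarse_probs, transition):
    """Fine-level distribution: p(fine=i) = sum_j T[i][j] * p(coarse=j)."""
    n_fine = len(transition)
    return [sum(transition[i][j] * coarse_probs[j]
                for j in range(len(coarse_probs)))
            for i in range(n_fine)]

# Two coarse classes ("waterfowl", "raptor") and three fine classes.
# Each COLUMN sums to 1, so it is a valid conditional distribution.
T = [
    [0.7, 0.0],  # duck  | waterfowl, raptor
    [0.3, 0.1],  # goose
    [0.0, 0.9],  # hawk
]
coarse = [0.8, 0.2]              # the network's coarse-level prediction
fine = apply_transition(coarse, T)
```

Because the columns of `T` are distributions, the output is again a valid distribution over fine classes, which is what makes the framework probabilistically consistent across levels.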
Graph convolutional networks (GCNs) have recently demonstrated remarkable learning ability in handling various kinds of graph-structured data. In general, deep GCNs do not perform well, since graph convolution in conventional GCNs is a special form of Laplacian smoothing, which makes the representations of different nodes indistinguishable. In the literature, multi-scale information has been employed in GCNs to enhance their expressive power. However, the over-smoothing phenomenon, a key problem of GCNs, remains to be resolved and investigated. In this paper, we propose two novel multi-scale GCN frameworks by incorporating the self-attention mechanism and multi-scale information into the design of GCNs. Our methods greatly improve both the computational efficiency and the prediction accuracy of GCN models. Extensive experiments on both node classification and graph classification demonstrate their effectiveness over several state-of-the-art GCNs. Notably, the two proposed architectures can effectively mitigate the over-smoothing problem of GCNs, and the depth of our models can even be increased to 64 layers.
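The over-smoothing effect mentioned above is easy to see in a toy experiment (this illustrates the problem, not the paper's remedy): repeatedly averaging each node's feature with its neighbours', which is what stacked graph convolutions amount to, drives all node representations toward a common value.

```python
# Toy demonstration of GCN over-smoothing: iterated Laplacian-style
# smoothing on a 3-node path graph collapses initially distinct
# scalar node features toward a consensus value.

def smooth_step(features, adj):
    """One round of mean aggregation over each node's closed neighbourhood
    (the node itself plus its neighbours), mimicking a graph convolution."""
    out = []
    for i in range(len(features)):
        neigh = [j for j, e in enumerate(adj[i]) if e] + [i]
        out.append(sum(features[j] for j in neigh) / len(neigh))
    return out

def spread(features):
    """Max pairwise gap: a proxy for how distinguishable the nodes remain."""
    return max(features) - min(features)

# Path graph 0-1-2 with very different initial node features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
feats = [0.0, 1.0, 10.0]
initial_spread = spread(feats)       # 10.0
for _ in range(20):                  # 20 "layers" of smoothing
    feats = smooth_step(feats, adj)
```

After 20 rounds the spread is negligible: the nodes have become indistinguishable, which is exactly why naively stacking many GCN layers fails and why mitigation techniques are needed.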
The graph convolutional network (GCN) is a powerful model that has been studied for various graph-structured data learning tasks. However, to alleviate the over-smoothing phenomenon and handle heterogeneous graph-structured data, the design of GCN models remains an important problem to be investigated. In this paper, we propose a novel GCN named SStaGCN (Simplified Stacking-based GCN) by exploiting the ideas of stacking and aggregation, which is an adaptive general framework for tackling heterogeneous graph data. Specifically, we first use the base models of stacking to extract the node features of a graph. Subsequently, aggregation methods such as mean, attention, and voting techniques are employed to further enhance the ability of node feature extraction. Thereafter, the aggregated node features are fed into a vanilla GCN model. Furthermore, a theoretical generalization bound analysis of the proposed model is explicitly given. Extensive experiments on 3 public citation networks and 3 additional heterogeneous tabular datasets demonstrate the effectiveness and efficiency of the proposed approach over state-of-the-art techniques. Notably, the proposed SStaGCN can effectively alleviate the over-smoothing problem of GCNs.
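The stacking-plus-aggregation step above can be sketched as follows. This is a hedged toy version: the "base models" are just hand-written score tables, and only the mean and voting aggregators are shown (the attention variant would learn the combination weights).

```python
# Toy sketch of the stacking + aggregation stage: several base models
# (hypothetical stand-ins) each score every node, and an aggregator
# combines them into the node features a vanilla GCN would then consume.

def mean_aggregate(outputs):
    """Element-wise mean of the base models' score vectors, per node."""
    n_models = len(outputs)
    return [[sum(m[node][c] for m in outputs) / n_models
             for c in range(len(outputs[0][node]))]
            for node in range(len(outputs[0]))]

def vote_aggregate(outputs):
    """Majority vote over the base models' argmax predictions, per node."""
    votes = []
    for node in range(len(outputs[0])):
        preds = [max(range(len(m[node])), key=m[node].__getitem__)
                 for m in outputs]
        votes.append(max(set(preds), key=preds.count))
    return votes

# Three base models scoring 2 nodes over 2 classes.
m1 = [[0.9, 0.1], [0.4, 0.6]]
m2 = [[0.8, 0.2], [0.3, 0.7]]
m3 = [[0.1, 0.9], [0.2, 0.8]]
feats = mean_aggregate([m1, m2, m3])   # dense features for the GCN input
labels = vote_aggregate([m1, m2, m3])  # hard votes, the discrete alternative
```

Mean aggregation preserves soft scores for the downstream GCN, while voting yields robust discrete decisions; the paper treats the choice of aggregator as an adaptive design knob.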
With the rapid development of deep learning techniques, various recent works have attempted to apply graph neural networks (GNNs) to solve NP-hard problems such as Boolean satisfiability (SAT), showing the potential of bridging the gap between machine learning and symbolic reasoning. However, the quality of the solutions predicted by GNNs has not been well studied in the literature. In this paper, we study the capability of GNNs in learning to solve the maximum satisfiability (MaxSAT) problem, from both theoretical and practical perspectives. We build two kinds of GNN models to learn solutions of MaxSAT instances from benchmarks, and show through experimental evaluation that GNNs achieve attractive performance in solving the MaxSAT problem. We also present theoretical explanations, based on algorithmic alignment theory, of why GNNs can learn to solve the MaxSAT problem to some extent.
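To make the learning target concrete, the following sketch shows the decoding and scoring step any such model needs: rounding per-variable soft scores (here a made-up stand-in for a GNN's output, not an actual model) into a Boolean assignment, and counting satisfied clauses, which is the MaxSAT objective.

```python
# Minimal MaxSAT decoding/scoring sketch. Clauses use signed literals:
# +i / -i means variable i appears positively / negated (1-indexed).

def decode(soft_scores, threshold=0.5):
    """Round GNN-style soft variable scores to a Boolean assignment."""
    return [s >= threshold for s in soft_scores]

def satisfied_clauses(clauses, assignment):
    """MaxSAT objective: number of clauses with at least one true literal."""
    def lit_true(lit):
        value = assignment[abs(lit) - 1]
        return value if lit > 0 else not value
    return sum(any(lit_true(l) for l in clause) for clause in clauses)

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3]]
# Hypothetical soft scores standing in for a trained GNN's output.
assignment = decode([0.9, 0.2, 0.7])   # -> x1=True, x2=False, x3=True
score = satisfied_clauses(clauses, assignment)
```

Here the decoded assignment satisfies all 3 clauses; in general MaxSAT asks for the assignment maximizing this count even when not all clauses can be satisfied at once.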